Question we want to solve:

Given first two weeks’ transactions, can we able to perdict a custom will have First Payment Default issue or not?

  1. First two weeks’ transactions:

  2. First Payment Default:

1. Load data from file:

trans1 <- read.csv('Data/April Vintage Transactions page 1.csv', stringsAsFactors=FALSE)
trans2 <- read.csv('Data/April Vintage Transactions page 2.csv', stringsAsFactors=FALSE)
trans2$DebtDimId <- as.character(trans2$DebtDimId)
trans3 <- read.csv('Data/April Vintage Transactions page 3.csv', stringsAsFactors=FALSE)
trans3$DebtDimId <- as.character(trans3$DebtDimId)
trans4 <- read.csv('Data/April Vintage Transactions page 4.csv', stringsAsFactors=FALSE)
trans4$DebtDimId <- as.character(trans4$DebtDimId)

trans.dat <- bind_rows( bind_rows(trans1,trans2),bind_rows(trans3,trans4) )
nrow(trans.dat)
## [1] 197271
#head(trans.dat)
#summary(trans.dat)

profile1 <- read.csv('Data/April Vintage 2013.csv', stringsAsFactors = FALSE)
profile2 <- read.csv('Data/April Vintage 2013 _ page 2.csv', stringsAsFactors = FALSE)
prof.dat <- bind_rows(profile1, profile2)
nrow(prof.dat)
## [1] 173253
#head(prof.dat)
#summary(prof.dat)


rm(trans1, trans2, trans3, trans4, profile1, profile2)

2. First Payment Default members:

The statement date of this card is fixed, which means some member will have a very short ‘first month’. Therefore, I used another way to define the First Payment Default:

  1. Month on book less than 2

  2. DisplayMinPay is greater than ActualMinPay

2.1 All unique member in our data set

length(unique(prof.dat$DebtDimId))
## [1] 7039

2.2 Member has MOB 0 or 1 data:

length(prof.dat$DebtDimId[which(prof.dat$MOB==0)])
## [1] 6660
length(prof.dat$DebtDimId[which(prof.dat$MOB==1)])
## [1] 7031

2.3 Apply our filter

MOB.0 <- prof.dat %>%
    filter(MOB == 0) %>%
    filter(DisplayMinPay > ActualMinPay) 

MOB.1 <- prof.dat %>%
    filter(MOB == 1) %>%
    filter(DisplayMinPay > ActualMinPay) 


nrow(MOB.0)
## [1] 167
nrow(MOB.1)
## [1] 2042

2.4 Members we believe having FPD

ProblemId <- unique(c(MOB.0$DebtDimId, MOB.1$DebtDimId))
length(ProblemId)
## [1] 2069

3. Transaction day after getting card

We consider time into our model as an very important factor. There are many way to using the transaction time data.

In this report, I’m using the day difference between member getting card and each transaction day as the time variable.

3.1 Date of member getting card: First transaction date vs. Activation date

Most credit cards are required active when user recieve the card, however, this card is actived when being issued. Therefore, activation date does not means the date member recieve the card.

In our case, we are interested in how much risk a member will owe our money. If a member never using our card, it has a lower risk they owe our money - They may still owe the membership fee, but at least, they are not stealing money from us.

Therefore, I decide to use First transaction date to indicate the day a member get their card.

trans.dat$TimeDimDt <- as.Date(trans.dat$TimeDimDt, "%m/%d/%Y")
trans.first.date <- trans.dat %>%
  group_by(DebtDimId) %>%
  summarise(firstDate = min(TimeDimDt))

nrow(trans.first.date)
## [1] 9369

Compare with previous section, we can see there are more member in transaction data table than in user profile table.

3.2 Calculate Transaction day

Transaction day = Transaction date - Date of member getting card

trans <- trans.dat %>% 
  full_join(trans.first.date, by='DebtDimId') %>%
  mutate(DayAfterGetCard = as.numeric(TimeDimDt - firstDate))

3.3 Legacy account

These account have been here for a while, we cannot find their first transaction

legacyId <- prof.dat %>% 
  group_by(DebtDimId) %>%
  summarise(minMOB = min(MOB)) %>%
  filter(minMOB > 1) %>%
  select(DebtDimId)

trans <- trans %>% 
  filter(! DebtDimId %in% legacyId ) 

length(legacyId)
## [1] 1

3.4 Transaction of first two weeks

f.2.trans <- trans[trans$DayAfterGetCard <= 14,]
f.2.TransCount <- f.2.trans %>%
  group_by(DebtDimId) %>%
  summarise(rowNum = n())
f.2.max <- max(f.2.TransCount$rowNum)

f.2.max
## [1] 70

4. Training variable and Target variable selection

I’m using transaction day (interger), transaction amount (numeric) and MCCcode (factor) as the training variables.

For the target variable, I made it up based on member FPD or not. For member who has FPD issue, I set the target variable as 999; for member who does not have FPD issue, I set the target variable as 0.

All data explorer below are based on the first 14 days transactions after getting the card.

4.1 MCC Explorer

MCCcode vs MCCCategory and MCCDescription

MCC.GourpByCode <- f.2.trans %>%
  group_by(MCCcode) %>%
  summarise(MCCcategoryCount = n_distinct(MCCCategory), MCCDescriptionCount = n_distinct(MCCDescription))

max(MCC.GourpByCode$MCCcategoryCount)
## [1] 1
max(MCC.GourpByCode$MCCDescriptionCount)
## [1] 1
MCC.GroupByDescription <- f.2.trans %>%
  group_by(MCCDescription) %>%
  summarise(MCCcategoryCount = n_distinct(MCCCategory), MCCCodeCount = n_distinct(MCCcode))

max(MCC.GroupByDescription$MCCcategoryCount)
## [1] 2
max(MCC.GroupByDescription$MCCCodeCount)
## [1] 4
MCC.GourpByCategory <- f.2.trans %>%
  group_by(MCCCategory) %>%
  summarise(MCCDescriptionCount = n_distinct(MCCDescription), MCCCodeCount = n_distinct(MCCcode))  

max(MCC.GourpByCategory$MCCDescriptionCount)
## [1] 55
max(MCC.GourpByCategory$MCCCodeCount)
## [1] 58

MCCcode to Description and Category

MCC.Code.to.Desc.Cate <- f.2.trans %>%
  group_by(MCCcode) %>%
  slice(1) %>%
  ungroup()

MCCcode frequency

mcc.feq <- f.2.trans %>% 
  group_by(MCCcode) %>% 
  summarise( MCCcodeFeq = n() ) %>%
  select(MCCcode, MCCcodeFeq) %>%
  left_join(MCC.Code.to.Desc.Cate, by='MCCcode') %>%
  select(MCCcode, MCCcodeFeq, MCCDescription)
 
plot_ly( mcc.feq, labels = ~MCCDescription, values = ~MCCcodeFeq, textinfo = 'label+percent', type = 'pie') %>% layout(showlegend = FALSE)

### MCCcode transaction ammount ###

mcc.amount <- f.2.trans %>% 
  group_by(MCCcode) %>% 
  summarise( MCCcodeAmount = sum(TransactionAmt) )%>%
  select(MCCcode, MCCcodeAmount) %>%
  left_join(MCC.Code.to.Desc.Cate, by='MCCcode') %>%
  select(MCCcode, MCCcodeAmount, MCCDescription)
 
plot_ly( mcc.amount, labels = ~MCCDescription, values = ~MCCcodeAmount, textinfo = 'label+percent', type = 'pie') %>% layout(showlegend = FALSE)

MCCCode amount per transaction

mcc.amount.pertrans <- mcc.feq %>%
  select(MCCcode, MCCcodeFeq) %>%
  left_join(mcc.amount, by='MCCcode') %>%
  mutate(amountPerTrans = MCCcodeAmount / MCCcodeFeq)
  

plot_ly( mcc.amount.pertrans, labels = ~MCCDescription, values = ~amountPerTrans, textinfo = 'label+percent', type = 'pie') %>% layout(showlegend = FALSE)

4.2 Trans Day Explorer

Day vs. frequency

day.feq <- f.2.trans %>% 
  group_by(DayAfterGetCard) %>% 
  summarise( DayFeq = n() ) %>% 
  arrange(DayAfterGetCard)

plot_ly( day.feq, x = ~DayAfterGetCard, y = ~DayFeq, type='bar')

Day vs. amount

day.amount <- f.2.trans %>% 
  group_by(DayAfterGetCard) %>% 
  summarise( DayAmount = sum(TransactionAmt) ) %>% 
  arrange(DayAfterGetCard)

plot_ly( day.amount, x = ~DayAfterGetCard, y = ~DayAmount, type='bar')
## Warning: Ignoring 1 observations

Day vs. amount per transaction

day.amount.pertrans <- day.feq %>%
  left_join(day.amount, by='DayAfterGetCard') %>%
  mutate(AmountPerTrans = DayAmount / DayFeq) %>%
  arrange(DayAfterGetCard)

plot_ly( day.amount.pertrans, x = ~DayAfterGetCard, y = ~AmountPerTrans, type='bar')
## Warning: Ignoring 1 observations

### Day vs. amount - box plot_ly ###

plot_ly( f.2.trans, x = ~DayAfterGetCard, y = ~TransactionAmt, type = 'box')
## Warning: Ignoring 2 observations

4.3 Account Infomation

Open Date

Open Limit

Current Limit

Current State (Location)

FPD rate

length(unique(prof.dat$DebtDimId)) / length(ProblemId)
## [1] 3.402127

4.4 Combine Transaction info and Account Info

4.4.1 Good/Bad Account

AllId <- unique(prof.dat$DebtDimId)
SafeId <- AllId[! AllId %in% ProblemId ]

4.4.2 Transaction Frequency in different account by MCC

MCC.Feq.Good <- f.2.trans %>% 
  filter( DebtDimId %in% SafeId ) %>%
  group_by(MCCcode) %>% 
  summarise( MCCcodeFeqGood = n() ) 

MCC.Feq.FPD <- f.2.trans %>% 
  filter( DebtDimId %in% ProblemId ) %>%
  group_by(MCCcode) %>% 
  summarise( MCCcodeFeqFPD = n() )

MCC.Feq.All <- full_join(MCC.Feq.Good, MCC.Feq.FPD, by='MCCcode')


MCC.Feq.All <- MCC.Feq.All %>%
  select(MCCcode, MCCcodeFeqFPD, MCCcodeFeqGood ) %>%
  left_join(MCC.Code.to.Desc.Cate, by='MCCcode')

Bar plot via abs value

plot_ly( MCC.Feq.All, x = ~MCCDescription, y =~MCCcodeFeqFPD, type='bar', name='FPD Accounts') %>%
  add_trace(y = ~MCCcodeFeqGood, name = 'Good Accounts') %>%
  layout(yaxis = list(title = 'Feq'), barmode = 'stack')
## Warning: Ignoring 63 observations
## Warning: Ignoring 25 observations

Pie Plot via ratio (Percentage of percentage)

MCC.Feq.All <- MCC.Feq.All %>%
  mutate(FeqAll = MCCcodeFeqGood+MCCcodeFeqFPD) %>%
  mutate(RiskRatio = MCCcodeFeqFPD/FeqAll )

plot_ly( MCC.Feq.All, labels = ~MCCDescription, values = ~RiskRatio, textinfo = 'label+percent', type = 'pie') %>% layout(showlegend = FALSE)

4.4.3 Transaction Amount in different account by MCC

MCC.Amount.Good <- f.2.trans %>% 
  filter( DebtDimId %in% SafeId ) %>%
  group_by(MCCcode) %>% 
  summarise( MCCcodeAmountGood = sum(TransactionAmt) ) 

MCC.Amount.FPD <- f.2.trans %>% 
  filter( DebtDimId %in% ProblemId ) %>%
  group_by(MCCcode) %>% 
  summarise( MCCcodeAmountFPD = sum(TransactionAmt) )

MCC.Amount.All <- full_join(MCC.Amount.Good, MCC.Amount.FPD, by='MCCcode')


MCC.Amount.All <- MCC.Amount.All %>%
  select(MCCcode, MCCcodeAmountFPD, MCCcodeAmountGood ) %>%
  left_join(MCC.Code.to.Desc.Cate, by='MCCcode')

Bar plot via abs value

plot_ly( MCC.Amount.All, x = ~MCCDescription, y =~MCCcodeAmountFPD, type='bar', name='FPD Accounts') %>%
  add_trace(y = ~MCCcodeAmountGood, name = 'Good Accounts') %>%
  layout(yaxis = list(title = 'Amount'), barmode = 'stack')
## Warning: Ignoring 64 observations
## Warning: Ignoring 25 observations

#### Pie Plot via ratio (Percentage of percentage) ####

MCC.Amount.All <- MCC.Amount.All %>%
  mutate(AmountAll = MCCcodeAmountGood+MCCcodeAmountFPD) %>%
  mutate(RiskRatio = MCCcodeAmountFPD/ AmountAll)

plot_ly( MCC.Amount.All, labels = ~MCCDescription, values = ~RiskRatio, textinfo = 'label+percent', type = 'pie') %>% layout(showlegend = FALSE)

5. Model

5.1 Explainable Model

Trade each transaction as a increment of chance to FPD.

Believe each transaction are individual.

Then we can fomulize the model as:

\[ChanceToFPD = \sum_i A_iC_i e^{(-C_t+t_i)}\]

In which, \(A_i\) denote the amount of transaction \(i\), \(C_i\) denote the affect ratio of MCC code of transaction \(i\), \(t_i\) denote the time of transaction \(i\) happens, \(C_t\) is a constant to centralize the data.

Pros: Easy to explain, easy to build.

Cons: 1. This model believe each transaction are individual, which may not correct 2. This model cannot take advantage in big data

5.2 Deep Neural Network

CNN or RNN both able to handle this problem.

Common CNN are easier to image and impletment, but hard to explain and optimize.

RNN will more reasonable, but I need to learn how to build up such a RNN system.

6. Discussion

  1. Impletment models
  2. FICO Score
  3. Clean up MCC code description